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Abstract 

We describe the basis of a matrix ordering heuristic for improving the incom- 
plete factorization used in preconditioned conjugate gradient techniques applied to 
anisotropic PDE’s. Several new matrix ordering techniques, derived from well-known 
algorithms in combinatorial graph theory, which attempt to implement this heuris- 
tic, are described. These ordering techniques are tested against a number of matrices 
arising from linear anisotropic PDE’s, and compared with other matrix ordering tech- 
niques. A variation of ROM is shown to generally improve the quality of incomplete 
factorization preconditioners. 

Keywords: Preconditioned conjugate gradient, preconditioner, matrix ordering, weighted 
graph 
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1 Introduction 

Preconditioned conjugate gradient (PCG) methods have been proven to be ro- 
bust and competitive techniques for the solution of matrices arising from PDE’s 
in a number of applications [11, 4, 17, 2, 14, 5, 20, 33, 31]. The successful appli- 
cation of PCG methods depends to a great extent on the formation of a rapidly 
convergent preconditioner. A number of studies have examined the effect of ma- 
trix ordering on the quality of preconditioners based on incomplete factorization 
[7, 8, 9, 13, 14, 26, 12]. In [7, 8, 9] evidence was presented to demonstrate how 
matrix ordering can have a profound effect on the quality of preconditioners. A 
heuristic was described that was shown to produce a good matrix ordering. This 
study examines the use of efficient algorithms from combinatorial graph theory 
which implement that matrix ordering heuristic. 

By way of background, we refer the reader to [34, 8, 21] for an outline of level 
based, incomplete, L/U factorization (denoted ILU(/), where l is the level of 
fill retained in the preconditioner). We will be referring to matrices as weighted 
graphs, where the matrix rows represent vertices, and the graph edges are encoded 
in the off- diagonals, the magnitude of the off-diagonal coefficients providing the 
“strength” of the connections. The reader may wish to review [27, 29, 18, 7] for 
relevant information on this view of matrices. 

Duff and Meurant [13] studied a large number of preconditioner orderings for 
matrices arising from isotropic and anisotropic PDE’s discretized on a regular 
grid. Their study considered orderings based solely on the sparsity pattern of 
the matrix, and concluded that Reverse Cuthill-McKee (RCM) ordering [18] was, 
in general, a good choice. This ordering reduces the bandwidth of the matrix, 
which tends to increase the overlap of fill and hence reduce the effect of dropped 
terms in ILU factorization. Dutto [14] also considered sparsity pattern based 
orderings, this time on Jacobian matrices arising from the discrete Navier-Stokes 
equations on irregular grids. Her results coincided with those of Duff and Meu- 
rant, indicating that RCM ordering, or the related Gibbs ordering [19] were good 
choices. 

Recently, D’Azevedo, Forsyth, and Tang derived the Minimum Discarded 
Fill (MDF) and Minimum Update Matrix (MUM) orderings [7, 8, 9], which 
are sensitive to the matrix coefficients, as well as the sparsity pattern. The 
development of these orderings was prompted in part by the problem of highly 
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anisotropic PDE’s, whose discretization can lead to matrices for which the wrong 
ordering will produce very unsatisfactory preconditioners. The analysis leading 
to these techniques revealed that the most effective ordering for an anisotropic 
matrix follows the direction of the weakest connections in the graph. 

MDF ordering is capable of detecting anisotropy in a matrix graph, and ex- 
ploiting it to produce exceptionally good orderings. Its one drawback is that it is 
expensive to compute; the algorithm has a time complexity of roughly 0(Nd 3 ), 
where N is the number of matrix rows, and d is the average number of non-zeros 
in a matrix row. MUM ordering, which is an approximation of MDF, does not 
detect anisotropy when the fill level l is small, but has been shown to produce 
workable orderings even for difficult matrices. Unfortunately, it is also fairly 
expensive to compute. In [4, 3] MUM was tested against the Navier-Stokes equa- 
tions, and in [8, 26] both MUM and MDF were tested against problems arising 
from linear PDE s with moderately extreme (lOOCkl) anisotropy. 

The objective of this study is to implement and test the heuristic of following 
weak connections in the matrix graph to produce an ordering. This is done using 
algorithms which are considerably faster than MDF or MUM. The mathematical 
motivation for this heuristic will be outlined in more detail in Section 2. 

Ordering techniques will be outlined which are sensitive to both the matrix 
sparsity patterns and matrix coefficients. These attempt to follow anisotropy 
and hence improve level-based ILU factorizations. A modification to the RCM 
algorithm will be considered. The orderings tested are based on the standard 
graph theoretical algorithms for a minimum spanning tree (MST), and the single- 
source problem (SSP), on the matrix graph. The main attraction of the MST 
and SSP algorithms as anisotropy detectors is their speed: they have a time 
complexity of 0(M log (AT)) or better, (where N is the number of matrix rows, 
and M the number of off-diagonal non-zeros in either the upper or lower half of 
the matrix). The new ordering techniques are described in detail in Section 3 
A number of matrices taken from the discretization of linear PDE’s will serve 
as test cases. These test cases are outlined in Section 4. The numerical results 
of matrix solver runs on these matrices using each ordering will be presented in 
Section 5, and summarized in the final section. 



Weighted Graph Ordering for PCG Methods 


4 


2 The Motivation for Weighted Graph Based 
Techniques 

Consider the anisotropic PDE 

d / dU\ d (dU\ , , 

fc( K fa) + di[d^) = - q(x ’ y) ' (*.»)e(0,l)x(0,l) (1) 

with a Neumann boundary condition, I< = 1000, and discretized on a 30 x 30 
regular grid with a five-point molecule with h = 1/30 as the grid size. The right 
hand side q(x, y) was defined as 


q(x,y) 


1 if (x, y) = (0, 0), 

< -1 if (a, y) = (1,1), . 
0 elsewhere 


The resulting linear system (which is similar to cases arising from highly anisotropic 
convection-diffusion problems) was solved with the preconditioned conjugate gra- 
dients method using an ILU(l) preconditioner. A zero initial guess was used, and 
the matrix was solved to a reduction of 10 -12 in the l 2 norm of the residual. Ta- 
ble 1 shows the solution time when the matrix was ordered in two ways: natural 
x-y ordering, which numbered the nodes in the x direction first, and natural y-x 
ordering, which numbered the nodes in the y direction first. Theorem 1 will show 
why the incomplete factorization in the x-y direction was poorer, despite both 
preconditioners having the same level of fill, and number of fill entries. 

If a given matrix A is symmetric, the fill entries in the factor L can be conve- 
niently described through a graph model [27, 29]. Let the elimination sequence 
be Vi , . . . , v n and Q k — (V*, £ k ) be the graph of A k = 


Vfc — W+l, • • • , Vn} , £ k = {(u,-, Vj) | djp ^ o} 
where k is the step of the elimination. 

It can be shown [18] that there is a nonzero entry lij if and only if there exists 
a path (vi, Vij , . . . , v im , Vj) in the graph of A where 


V il • • • > V im ^ Vj-l} • 


The size of Uj is related to the size of entries on this path. 
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Theorem 1 Let A be an M-matrix and let (u t *, v ^,. . . , Vi m , Vj ) be a path in the 
graph A where 


V il ^ {^1 5 • • • , V j- 1} , 


then for i > j 


hj ^ 


Proof: 5ee [25] and [8]. 


•■ a ii 1 a i 1 i 2 ' • • a t m jl 
di x d{ 2 * • • d{ m dj 


dk 


akk • 


For the anisotropic problem 1, the resulting matrix is a symmetric M-matrix. 
All edges aligned along the ;r-axis have values 0(Kj K -j- 1), and edges aligned 
along the y- axis have values 0(1/ K). If the natural x-y row order is used, then 
all new fill entries will be oriented more in the x direction (see Figure 1). From 
the lower bound in Theorem 1 , the fill entries in the matrix will have a slow decay 
rate. Conversely, if the natural y-x ordering is used, the fill entries will have a 
more rapid decay rate. Thus the value of the fill using the y-x ordering will have 
less of a bearing on the quality of the preconditioner as the level of fill increases 
than the fill using the x-y ordering. 

In this study MDF ordering [7] and MUM ordering [8, 9] will be used as exam- 
ples of effective, matrix coefficient sensitive orderings. Both orderings attempt 
to minimize the amount of fill discarded by the incomplete factorization process. 
MDF uses a more accurate, and more expensive measure for the discarded fill, 
and is, as noted in the introduction, capable of detecting anisotropy. The reader 
is referred to the papers cited for the details about MDF and MUM, and earlier 
comparative studies using these algorithms. 

Reverse Cuthill-McKee ordering [18] will be used as a generally effective [13, 
14] matrix coefficient insensitive ordering which is quick to compute. Because we 
will be making a modification to this algorithm to render it coefficient sensitive, 
we briefly outline the algorithm in Figure 2. 


3 Weighted Graph Algorithms 

As noted above, the following techniques are all matrix coefficient sensitive, and 
based on well established algorithms. The following algorithms seek out the weak 
connections in the matrix graph, and attempt to produce an ordering consistent 
with the heuristic that follows from Theorem 1. 
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All of the techniques described below operate on the graph Q of the symmetric 
matrix A, where each node in the graph corresponds to a diagonal entry in the 
matrix. Weights are assigned to the edges between two nodes i and j by the 
absolute value of the matrix olf-diagonal | A{j |. 

A detailed description of minimum spanning tree and single source prob- 
lem algorithms may be found in [32], and many data structures textbooks. All 
of the ordering algorithms described below have an overall time complexity of 
G(M log N) or better. 

3.1 Modified RCM 

RCM, as it stands in Figure 2, can miss the anisotropy of a problem and produce 
a poor ordering (see Section 5, also [4]). Step 3b of the algorithm as given was 
modified so that un-numbered neighbors of a node were sorted by the weight of 
their connection strengths whenever their degree was equal. Since the ordering is 
reversed in the final stage, it was not clear whether an ascending, or descending 
connection strength order was appropriate, so both were tried. We will denote 
the modified RCM with degree tie breaking in ascending order as RCMa, and in 
descending order as RCMd- 

3.2 Minimum Spanning Tree with Three Versions 

Three ordering methods based on the minimum spanning tree (MST) of the 
matrix graph were tested. The MST on the matrix graph creates an acyclic 
subgraph using the smallest possible edge weights, and hence will select edges 
that connect nodes weakly. Consistent with the heuristic outlined in Section 2, 
these weak connections will be followed to attempt to produce a good ordering. 

Figure 3 outlines the MST-based algorithms. The first two steps are the same 
for all three variations. To construct the MST, we used Kruskal’s algorithm [32]. 
It was selected for its simplicity; faster, 0(M \og\og N) algorithms are available 
(where M is the number of graph edges). 

In the first variation on the algorithm, a root node is selected on the tree, and 
a depth-first search performed. The nodes are numbered in the order in which 
they are encountered in the tree, always choosing the weakest connections first at 
branching points. The depth-first search will thus tend to follow lines of weakly 
connected nodes, producing an ordering that follows the anisotropy in the desired 
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direction, and which keeps nodes grouped locally in a reasonably natural fashion. 
This ordering will be denoted MSTd. 

The second variation on the algorithm is a pre-processing step done before 
the MST is constructed to break ties in the depth-first search stage of MST D . 
A small (< off- diagonals) value is multiplied by the original node number, and 
(symmetrically) added to the row above (i.e. also to the column below) the 
diagonal entry corresponding to that node. This forces a slight bias in the matrix 
so that when the MST algorithm must decide between edges that were previously 
equally weighted, an edge will be selected that reflects the original ordering. A 
depth-first search is then performed on the tree. We will see that tie-breaking 
by natural ordering proves useful in situations of mixed anisotropy and isotropy. 
This variation of MST ordering will be denoted MSTq. 

The final variation on MST-based ordering computes the distance from the 
given root node to every node on the MST. The nodes are then sorted in order of 
this distance. This effectively produces level sets, much as with RCM [18], only 
biased in the desired direction of the anisotropy by the removal of strong edge 
connections. The Cuthill-McKee (CM) ordering (which RCM reverses) produces 
level sets that might also be thought of as nodes grouped by contours on the 
graph. This variation distorts the contours, using the MST as a measure of 
distance in the graph. This variation will be denoted MSTc- 

3.3 Single Source Problem with Contouring 

MST C ordering uses an inexact measure of distance from a given root node to 
the rest of the graph. The solution to the single source problem on a weighted 
graph produces the exact minimum weighted distance from a root node to all 
nodes in the graph. For anisotropic problems, nodes with weak connections to 
the root will appear “closer” to it than those with strong connections. With the 
single source problem solved, this ordering, as with MST C , sorts the nodes in 
order of distance from the root, again producing distorted level sets, or contour 
sets which follow the weakly connected direction of the anisotropy. This ordering 
will be denoted SSP. 

This ordering may also be seen as a Breadth-First Search traversal of the 
graph, using the weighted distance from the root node to determine the depth of 
each search level. 

The reader will note that given a graph with all equal weights, this will pro- 
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duce roughly Cuthiil-McKee ordering, the difference being that SSP ordering 
does not order based on node degree within a level set. SSP ordering is, by the 
contour set analogy, a skewed” variation on CM ordering. This in turn suggests 
that, as with RCM, reversing the SSP ordering might be beneficial. Thus we will 
also consider Reversed SSP, or RSSP ordering. 


4 Test Problems 


All of the ordering methods mentioned were tested against fourteen problems, 
eleven on a regular grid and three on simplicial grids. Each test case is different in 
some aspect of dimension, isotropy or anisotropy, direction of anisotropy, varying 
coefficients, and grid type. The problems are all based on the PDE 


d_ 

dx 





in two dimensions, and 


~<l( x iy), (x,y) <E (0,1) x (0,1) 


(2) 


d_ 

dx 




'dU\ d 

Ky ~dy) + dz 



-qfayiz), 


( 3 ) 


(x,y,z) e (0,1) x (0,1) x (0,1) 

in three dimensions. The usual five- or seven-point finite difference discretization 
was used on the regular grid problems, with an harmonic average used to deal 
with cases where the coefficients ( K x , K y or K z ) were discontinuous [1]. Despite 
the fact that a number of the problems produced only positive semi-definite 
matrices, the PCG method still converged. 

For the various regular grid problems the same number of points were used in 
each of the x, and y, (and 2 in three dimensions) directions. We define a discrete 
( x i,Vj,Zk) coordinate system for a grid with node spacing h = l/(A^ edge - 1), 

where N e d ge is the number of nodes along the edge of the unit square or cube, so 
that 


** = (i - 1 )h, y 3 = (j - 1 )h, z k = (Jc- 1 )h, 1 <i,j,k< iV edge . 

This will be used in the definition of the regular grid problems. The term 
<ld(xi,yj,z k ) denotes the discretized right hand side, and is zero unless other- 
wise noted. The unit square or cube region will be denoted R, with boundary 
SR, as required. 
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The three problems on irregular grids are too complicated to fully describe 
here. The reader is referred to [16] for a more detailed definition of the RE- 
FINE2D and FE2D problems, and [22] for the FE3D problem. 

We surmised, based on the results of MDF(O) ordering (see Figures 5 through 
7), that grouping regions of similar permitivity on regular grid problems was 
the correct thing to do. The magnitude of matrix row off- diagonals, not merely 
their difference, might be useful to ensure that those differently weighted blocks 
were ordered together. For the irregular grid problems, where the weighting 
of the edges varies gradually, such a particular approach to grouping was not 
evident. Preliminary testing revealed that regular grid matrices were indeed 
more effectively ordered if they were not scaled, hence preserving the magnitude 
information for blocks on the grid. The matrices from irregular grid problems 
were more quickly solved if they were first symmetrically scaled so that their 
diagonal entries were equal to one. Thus all the following testing will be done on 
the unsealed regular grid, and scaled irregular grid matrices. 


4.1 Problem 1. LAP5D 

This first problem is the two dimensional Laplace’s equation on the unit square 
with Neumann boundary conditions and five point sources, discretized on a reg- 
ular 30 X 30 grid. It is similar to that used in [13, 8]. 

4.2 Problem 2. BIG1DIR 

This problem is similar to that presented as Equation 1 earlier in the paper. A 
single, fairly strong anisotropy defines the problem. 

Parameters: 2 Dimensions, Regular Grid, Neumann BC’s 

A e dge = 30 
(K x ,K y ) = (1000,1) 

Source terms qd{xi,yj) defined as in LAP5D. 

4.3 Problem 3. VDVORST 

Tested in [26], this problem from [10] also exhibits fairly extreme anisotropy over 
the entire region, but with variations in the coefficient strengths. 
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Parameters: 2 Dimensions, Regular Grid, Dirichlet BC’s 


(K X ,Ky) = < 
?(* ,y) 


^edge = 41 

(100,0.01) in (0.25,0.75) x (0.25,0.75) 
(1,0.0001) elsewhere 

( 100 in (1/4, 3/4) x (1/4, 3/4) 

0 elsewhere 


U = 0 on £R 


4.4 Problem 4. STONE 

Stone’s third problem [30] presents a large isotropic region, with inset isotropic 
and anisotropic blocks of different orientations and magnitudes. See [30, 8] for 
the exact specification of this problem, which is two dimensional, and on a 31 x 31 
regular grid. 


4.5 Problem 5. ANISO 

The region in this problem is completely anisotropic, but with four distinct re- 
gions in two directions. The ratio of K x to K y was 100:1 in each region. It is 
defined on a two dimensional, 30 x 30 regular grid. For exact specifications, see 
[ 8 ]. 


4.6 Problem 6. LAP7D 

This problem is the three dimensional Laplace’s equation on the unit cube with 
Neumann boundary conditions and five point sources, discretized on a 30 x 30 x 30 ‘ 
regular grid. 

4.7 Problem 7 & 8. BIG1DIR3E and BIG1DIR3F 

These two problems have uniform coefficients that give them strong directions of 
anisotropy. Two directions were used hoping to differentiate between the methods 
that are incapable of detecting anisotropy. 

Parameters: 3 Dimensions, Regular Grid, Neumann BC’s 


Aedge = 30 
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(K XJ K yt K z ) 


(100,1,1000) for BIG1DIR3E 
(1000,100,1) for BIG1D1R3F 


Source terms qd(xi,yj,z k ) defined as in LAP7D. 


4.8 Problem 9. STONE3D 

This is a three dimensional version of the 2D STONE problem. It was formed 
by projecting the blocks defined by STONE into three dimensions using the 2D 
pattern for the z ranges. 

Parameters: 3 Dimensions, Regular Grid, Neumann BC’s 

Nedge = 31 

(1,100,100) for (xi, yj , Zk) 15 < i < 31, 1 < j < 17, 1 < k < 17 
(100, 1, 1) for (a;*, yj , z k ) 6 < i < 13, 6 < j < 13, 6 < k < 13 
(0, 0, 0) for (xi, y h z k ) 13 < i < 20, 22 < j < 29, 22 < k < 29 

(1,1,1) (xi, yi , z k ) elsewhere 

^(4,4,4) = 1, gd(4, 28, 28) — 0.5, q d { 24, 5, 5) = 0.6 

^(15, 16, 16) = -1.83, ^(28,28,28) = -0.27 


(K x ,K y ,K z ) = 


4.9 Problem 10 & 11. ANIS03E and ANIS03F 

These are three dimensional versions of the ANISO problem. Six blocks are 
defined so that abutting regions have different strong directions of anisotropy. 
Again, two variations are defined, ANIS03E showing only one distinct direction 
in each block. 

Parameters: 3 Dimensions, Regular Grid, Neumann BC’s 

A edge = 30 

' (100,1, A*) in (0,1/2) x (0,1/2) x (0,1/2) 
and (1/2,1) x (1/2,1) x (0,1/2) 
(K x ,K y ,K z )= < (K v , 1,100) in (1/2, 1) x (0,1/2) x (1/2,1) 

and (0,1/2) x (1/2,1) x (1/2,1) 

(100, K v ,l) elsewhere 
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f 100 for ANIS03E 
v ” { 1000 for ANIS03F 

Source terms qd(xi,yj, z k ) defined as in LAP7D. 

4.10 Problem 12. REFINE2D 

A finite element method using linear triangular basis functions was used to dis- 
cretize this problem. In this example, K x and K y are constant. Grid refinement 
was applied, and the final triangulation was such that the resulting matrix was 
an M-matrix. 

4.11 Problem 13. FE2D 

A finite element method using linear triangular basis functions was also used for 
this problem. However, K x and K y varied by four orders of magnitude. The grid 
was defined by constructing a distorted quadrilateral grid, and then triangulating 
in an obvious manner. A Delaunay- type edge swap was used to produce an M- 
matrix. 

4.12 Problem 14. FE3D 

This problem is a three-dimensional version of FE2D. A finite element discretiza- 
tion was used, with linear basis functions defined on tetrahedra. The coefficients 
{K x , K y , K z ) vary eight orders of magnitude (this model was derived from actual 
field data from a groundwater flow experiment). The nodes were defined on a 
25 X 13 x 10 grid (3250 nodes) of distorted hexahedra, which were then divided 
into tetrahedra. The resulting matrix was not an M-matrix, and the average node 
connectivity was fifteen. In general it is not possible, for a given node placement, 
to obtain an M-matrix in three-dimensions if linear tetrahedral elements are used 
[23]. 

5 Numerical Results 

The numerical experiments were run on a Sun 4/670 server (nominally rated at 
4 MFLOPS) using double precision arithmetic. The convergence criterion 

||r*|| 2 < <oZ||r°l| 2 , tol = 10~ 12 
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was used, where r k was the residual vector after the k th iteration of conjugate 
gradient acceleration. In all cases the initial guess was chosen to be the zero 
vector. 

A note concerning CPU times is in order. The mechanism provided in Sun 
FORTRAN for computing CPU times tends to be out as much as 10% between 
runs of the same test, and provides an accuracy of only 0.01 of a CPU second. The 
reader should keep this margin of error in mind while interpreting the following 
results. All results are for CPU time required only for the program execution, 
and are reported in seconds. 


5.1 Ordering Time 

Table 2 lists the time required to perform the various orderings for a few of the 
problems. The time to perform RCM ordering is given, and the other ordering 
times are scaled by that value for each problem. On average, performing RCM 
ordering accounted for between 2 and 4% of the overall solution time. The time 
required for MDF varies considerably depending on the fill level requested for the 
calculation, so MDF for level 1 fill is given. 

Note from these results that the graph based orderings take roughly 1 to 4 
times longer to produce than RCM, which is considerably less than the 10 to 21 
times longer for MUM, and 28 to 191 times longer for MDF. Note that RCM A and 
RCM d take, on average, the same amount of time to compute, so the numbers 
for RCMd are not shown. 

5.2 Solution Time 

One test run was made for each of the ordering methods, on each of the fourteen 
test matrices, for preconditioners using ILU(O), ILU(l) and ILU(2). Of particular 
interest in analyzing the results are the number of iterations required to solve 
the problem to the desired tolerance, the amount of fill produced in the ILU 
factorization, and the total time required for the iterative solve. Since the last 
measurement is a function of the first two, only the iterative solve times will be 
presented in detail. 

Natural (default node order) ordering, was generally worse than RCM in 2D, 
and on irregular grids, producing marginally better solution times on 3D regular 
grids. Results with MUM were mixed, however, MUM generally does best on 
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problems with large computational molecules [8, 9], and the problems posed here 
have relatively small molecules. MDF proved again to be a good ordering in most 
cases. 

The weighted graph based orderings performed best at ILU(l). Table 3 shows 
the time required for the all the iterations required to solve the linear system, 
using each of the ordering methods, at this level of fill. Because RCM is popular, 
and often viewed as the best matrix coefficient insensitive ordering [13, 14], we 
have scaled all the solution times by the value for RCM ordering for that matrix, 
giving only the solution time for RCM ordering in CPU seconds. At ILU(O), 
weighted graph methods generally performed as well as RCM on regular grid 
problems, and significantly worse on irregular grids. At ILU(2), the MST and 
SSP based orderings fared somewhat poorly. 

At ILU(l), the MST based algorithms ran significantly faster than RCM 
in all two-dimensional, regular grid cases where the regions were completely 
anisotropic. Tie breaking (MST^) was required to produce good results in the 
two 3D problems where one direction of anisotropy was dominant. Tie break- 
ing also was required to produce adequate results for STONE, which had mixed 
anisotropic and isotropic regions, but the result was still slower than RCM. 

Table 3 shows that SSP and RSSP ordering produced significant improvements 
over RCM only when the region was anisotropic in a single direction. Unlike 
the MST algorithms, the SSP orderings could not follow the sharp changes in 
anisotropic direction of the ANISO problem (see next subsection). 

The RCM a ordering showed favourable results. At worst it caused a 30% 
slower solution over RCM, and that in only one case at ILU(l). It generally 
produced solutions taking 47% to 108% the time of basic RCM, and did better 
on irregular grids, and regions that were all anisotropic in a single direction. 
Comparing the BIG1DIR3E and BIG1DIR3F problems, RCM A produced a solu- 
tion in the same time for both, whereas basic RCM failed to follow the direction 
of anisotropy of the latter case, and produced a poor ordering. At ILU(O), and 
ILU(2), RCM a was never worse than 12% slower than RCM, and slightly faster in 
almost half the cases. RCMd fared somewhat poorly, and was never significantly 
faster than RCM A , hence its timing results are not shown. 

We note in passing that, in the preliminary testing for this study, different 
root node placements were tried for the MST and SSP orderings. This was shown 
to have little effect on the solution time. 
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5.3 Drop Tolerance Preconditioning 

Also noteworthy is the application of these algorithms to drop tolerance precon- 
ditioning [6], where matrix fill is discarded based on its magnitude. With drop 
tolerance, a good ordering will speed computation by causing the magnitude of 
the fill to decay more quickly, leading to fewer fill entries and faster preconditioner 
matrix multiplies. 

Tests were performed dropping preconditioner fill entries \a {j \ < e RowMax* 
where | RowMax 1 | is the maximum magnitude entry in row i , and e is the 
drop tolerance parameter. Solutions with e = 0.01 produced good results for 
well conditioned problems, and e = 0.001 was required for more difficults ones. 
MST C ordering, and SSP ordering produced the best results at these levels, 
reducing solution time (compared to RCM) by an average of 26%. Weighted 
RCM orderings generally did not outperform plain RCM. 

Drop tolerance methods can be defined in many different ways (our matrix 
solver package defines eight), thus it is a little more difficult to analyze the 
outcome of the tests. The exact number of fill entries is not determined by the 
graph of the problem, adding another level of complexity to the analysis. Previous 
experience [6] has show that the parameters tend to be problem dependant and 
the results hard to generalize. Thus we feel that a detailed presentation of these 
results is not justified. 

5.4 Some Ordering Pictures 

Using MATLAB [24], a number of pictures of graph orderings are given in Figures 
5, 6 and 7. In this visualization technique, nodes numbered first appear as darker 
squares, and the nodes numbered last appear as the lightest squares. 

Figure 5 shows the orderings produced by MDF(0), MST D , and SSP on the 
VDVORST problem. Notice how MDF(0) picks up the interior region of differing 
coefficients. The other orderings pick up only on the single direction of weak 
connections, producing essentially the same pictures. 

Figure 6 shows the MDF(0), MST D , and SSP orderings on the ANISO prob- 
lem. Note the similarity between the MDF(0) and MST orderings, both detected 
and followed the changes in the direction of anisotropy. Note the SSP order- 
ing tended to order the middle and work outward, missing the basic anisotropy. 
(Note that the two anomalously ordered blocks were caused by an unavoidable 
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tie-breaking problem inherent in the binary heap used to compute the Single 
Source Problem.) 

Figure 7 shows the MDF(O), MST D , and MST£ orderings on the STONE 
problem. While MDF identifies and orders the various blocks in the test, the MST 
routines fail to identify them clearly. Thus we see a weakness in the MST routines 
when isotropic and anisotropic regions exist in the same problem. However, if the 
subregions of anisotropy are known in advance, MST ordering could potentially 
be applied to those individual subregions, then the results linked to produce a 
final ordering. MDF(O) is essentially doing this for the STONE problem. 

6 Summary 

We have presented and tested seven new matrix ordering techniques which are 
sensitive to the coefficients, as well as the sparsity pattern, of matrices. These 
techniques attempt to implement the heuristic, based on Theorem 1, that or- 
dering along lines of weakly connected nodes results in an improved incomplete 
factorization for PCG methods. 

MDF ordering again proved to be very good, however, it is expensive to com- 
pute. It would be the most useful if a large number of similar matrix problems 
were to be solved which could efficiently exploit a single MDF ordering compu- 
tation. 

Methods based on the minimum spanning tree and single source problem 
only showed significant advantage over RCM on two-dimensional regions that 
were entirely anisotropic, with one level of fill in the preconditioner. The single 
source problem based techniques were unable to follow changes in the direction 

of anisotropy, and hence were only advantageous when no changes in direction 
were present. 

The modified RCM technique RCM A proved to be generally better than RCM, 
and rarely significantly worse. RCM A performed consistently well on irregular 
grids, and for all levels of fill. We conclude that RCM A is, in general, a good 
choice over plain RCM. 

A number of questions surrounding good matrix ordering for ILU precondi- 
tioners remain to be solved. Good results have been obtained for systems of 
equations using block ordering [15, 4], but more work needs to be done in this 
area. Also, investigations into ordering based on eigenvalue computations, called 
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spectral ordering (and closely related to [28]), are being undertaken. 
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Figure 1: Orientation of new fill in x-y and y-x natural orderings. New fill is indicated by 
dotted lines. 
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Figure 2: The Reverse Cuthi 11- McKee (RCM) ordering algorithm 

1. Determine a starting pseudo-peripheral node K of graph Q . 

2. Number 1Z first in the ordering. 

3. For i = 1 . . . (Number of Nodes) do (following node ordering) 

3A. U = { ALL UN-NUMBERED NEIGHBORS OF NODE i} 

3b. Number elements of U in order of node degree 

ENDDO 

4. Reverse the ordering determined in stage 3. 


Table 1: Solution time for an anisotropic problem with two orderings. 


Ordering 

Solution 
Time (s) 

Iterations 

Natural x-y 

1.13 

35 

Natural y-x 

0.43 

7 
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Figure 3: The Minimum Spanning Tree based ordering algorithms 

1. Construct the minimum weight spanning tree T of Q. 

2. Select a root node 71 of graph Q. 

Variant 1: MSTd 

3. Perform a depth-first search (pre-order traversal) of T 

STARTING FROM 71 . At EACH BRANCH SELECT THE MINIMUM 
WEIGHT EDGE FIRST. 

4. Number the nodes in the order they were encountered in Step 3. 
Variant 2: MSTj 

3. Using the original ordering, add some factor e times the node 

NUMBER TO THE WEIGHTS OF EDGES CONNECTING NODE i WITH NODE j WHERE 
i < j IN THE ORIGINAL ORDER, (e <C THE OFF DIAGONAL WEIGHTS). 

4. Perform a depth-first search (pre-order traversal) of T 

STARTING FROM 71 . At EACH BRANCH SELECT THE MINIMUM 
WEIGHT EDGE FIRST. 

5. Number the nodes in the order they were encountered in Step 4. 
Variant 3: MSTc 

3. Compute the (weighted) distance from 71 to each node of T. 

4. Order the nodes starting with 71 , then in order of shortest 

TO LONGEST DISTANCE FROM 71 ALONG T. 


Figure 4: The Single-Source Problem based ordering algorithm 

1. Select a root node 71 of graph Q. 

2 . Compute the single-source problem from 71 for the entire 
graph Q. This associates the minimum weighted distance 

FROM 71 TO EACH NODE IN THE GRAPH Q. 

4. Order the nodes starting with 71 , then in order of shortest 
to longest distance FROM 71 ON Q. 
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Table 2: Time Required to Perform Ordering 



— — — — 

Ordering Method 


(CPU s) 

Scaled by RCM 

Problem 

RCM 

MST d 

MSTf) 

MST C 

SSP 

RSSP 

RCM A 

RCM d 

MDF 

MUM 

LAPD5 

0.03 

1.7 

3.3 

3.0 

1.3 

2.0 

2.0 

1.7 

32.7 

21.3 

ANISO 

0.03 

2.0 

4.0 

3.7 

2.0 

2.0 

2.3 

2.7 

31.0 

15.7 

LAPD7 

1.85 

2.2 

2.6 

2.7 

1.5 

1.6 

1.5 

1.4 

70.4 

10.5 

STONE3D 

2.13 

2.2 

3.3 

3.1 

1.5 

1.6 

1.4 

1.4 

69.5 

9.8 

FE2D 

0.09 

3.3 

3.2 

3.9 

1.6 

1.6 

1.2 

1.1 

27.8 

10.7 

FE3D 

0.53 

2.8 

3.1 

2.9 

1.0 

1.0 

1.0 

1.0 

191.3 

12.1 


Table 3: Time Required to Perform All PCG Solver Iterations (at ILU(l) ) 


Problem 

Ordering Method 

(CPU s) 

Scaled by RCM 

RCM 

MST d 

MSTi 

MST C 

SSP 

RSSP 

RCM a 

Nat 

MDF 

MUM 

LAPD5 

0.71 

1.01 

1.01 

1.00 

1.08 

1.06 

0.99 

1.06 

0.92 

1.07 

BIG1DIR 

0.73 

0.92 

0.45 

0.51 

0.47 

0.44 

0.47 

1.07 

0.41 

0.97 

VDVORST 

0.76 

0.54 

0.57 

0.62 

0.53 

0.55 

0.59 

1.01 

0.33 

1.01 

STONE 

0.77 

1.86 

1.17 

2.23 

1.29 

1.26 

1.03 

1.04 

0.84 

1.21 

ANISO 

0.71 

0.66 

0.68 

1.24 

1.66 

1.20 

1.08 

1.10 

0.80 

1.42 

LAPD7 

62.81 

0.90 

1.05 

1.10 

1.07 

1.05 

1.07 

0.92 

0.89 

1.01 

BIG1DIR3E 

131.95 

2.06 

0.89 

0.90 

0.88 

0.86 

0.97 

0.97 

0.84 

1.64 

BIG1DIR3F 

231.29 

1.20 

0.58 

0.58 

0.57 

0.61 

0.55 

0.87 

0.49 

0.85 

ANIS03E 

90.40 

0.97 

1.03 

1.15 

1.16 

1.09 

0.98 

0.93 

0.80 

0.91 

ANIS03F 

104.54 

1.32 

1.29 

1.49 

1.67 

1.41 

1.30 

0.83 

0.81 

1.30 

STONE3D 

114.47 

1.00 

1.16 

1.27 

1.02 

1.05 

1.06 

0.92 

0.74 

0.87 

REFINE2D 

1.54 

2.39 

2.64 

2.58 

2.01 

1.45 

0.97 

3.42 

1.04 

1.29 

FE2D 

1.48 

3.32 

2.63 

2.70 

2.24 

1.82 

0.94 

1.95 

0.89 

1.47 

FE3D 

7.63 

1.15 

1.23 

1.01 

0.92 

0.72 

0.77 

1.09 

0.88 

0.73 


Note: This table lists the time to complete all PCG iterations required to solve the problem 
to tol = 10 , where tol is the G norm of the linear solution residual. RCM times are given 

in CPU seconds. Other ordering times are scaled by the time required to solve the problem 
using RCM ordering. 












